class: center, middle, inverse, title-slide #
Support Vector Machines
## 🚧 🛣️ 🦿 ### Applied Machine Learning in R
Pittsburgh Summer Methodology Series ### Lecture 4-A July 22, 2021 --- class: inverse, center, middle # Overview <style type="text/css"> .onecol { font-size: 26px; } .twocol { font-size: 24px; } .remark-code { font-size: 24px; border: 1px solid grey; } a { background-color: lightblue; } .remark-inline-code { background-color: white; } </style> --- class: onecol ## Roadmap .left-column[ .pt4[ <img src="data:image/png;base64,#../figs/map.jpg" width="100%" /> ] ] .right-column[ 1. Maximal Margin Classifier 🚶 1. Support Vector Classifier 🏃 1. Support Vector Machine 🚴 1. Support Vector Regression 🚵 1. Applied Example 1. Live Coding 1. Hands-on Activity ] --- class: onecol ## Notices and Disclaimers The ideas underlying SVM are really *clever* and *interesting*! 😃 -- SVM is also a good algorithm for *smaller*, *messier* datasets! 😍 -- However, there is a lot of *terminology* and *math* involved... 😱 -- <p style="padding-top:15px;">I will try to shield you from this and give only <b>the necessities</b></p> - That means there will be some things I need to "hand wave" - I may also need to skip questions with very technical answers - Finally, it may be tough to follow along since training takes a while -- <p style="padding-top:15px;">But you should get a <b>strong intuition</b> and <b>applied knowledge</b></p> - This will prepare you nicely to dive into a longer course on the topic --- class: inverse, center, middle # SVM Intuitions --- class: onecol ## A Tale of Two Classes If this is our training data, how do we **predict the class** of new data? 
<img src="data:image/png;base64,#../figs/maxmargin1.png" width="100%" /> --- class: onecol ## Drawing a Line in the Sand With one feature, we could find a **point** that separates the classes (as higher or lower) <img src="data:image/png;base64,#../figs/maxmargin2.png" width="100%" /> --- class: onecol ## Analysis Paralysis But there are many possible decision points, so **which should we use?** <img src="data:image/png;base64,#../figs/maxmargin3.png" width="100%" /> --- class: onecol ## Maximal Margin Classifier (MMC) The MMC algorithm finds and uses the point with the **largest** (i.e., maximal) **margin** <img src="data:image/png;base64,#../figs/maxmargin4.png" width="100%" /> --- class: onecol ## Maximal Margin Classifier If we have two features, we can extend this idea using a 2D plot and a decision **line** <img src="data:image/png;base64,#../figs/maxmargin5.png" width="80%" /> --- class: onecol ## Maximal Margin Classifier If we have three features, we will need a 3D plot and a decision **plane** (i.e., flat surface) <img src="data:image/png;base64,#../figs/3d_plane.gif" width="45%" /> .footnote[[1] Credit to [Zahra Elhamraoui](https://medium.datadriveninvestor.com/support-vector-machine-svm-algorithm-in-a-fun-easy-way-fc23a008c22) for this visualization.] --- class: onecol ## Maximal Margin Classifier If we have four or more features, we will need a decision **hyperplane** -- .bg-light-yellow.b--light-red.ba.bw1.br3.pl4[ **Caution:** You may hurt yourself if you try to imagine what a hyperplane looks like. ] -- .pt1[ **Margins still exist** in higher-dimensional space and we still want to maximize them ] - Our goal is thus to locate the class-separating hyperplane with the largest margin - The math behind this is beyond the scope of our workshop, but that's the idea -- .pt1[ We can still **classify new observations**: which side of the hyperplane do they fall on? 
] --- class: onecol ## Maximal Margin Classifier Only the observations that define the margin, called **support vectors**, are used Because it only uses a subset of the data anyway, MMC does well with smaller datasets <img src="data:image/png;base64,#../figs/maxmargin8.png" width="80%" /> --- class: onecol ## Maximal Margin Classifier This means that **outliers can have an outsized impact** on what is learned For instance, this margin is likely to misclassify examples in new data <img src="data:image/png;base64,#../figs/maxmargin9.png" width="80%" /> --- ## Comprehension Check \#1 .pull-left[ ### Question 1 **What is the maximum number of features you can use with the MMC algorithm?** a) 1 feature b) 2 features c) 3 features d) Any number (no maximum) ] .pull-right[ ### Question 2 **Which observations does the MMC algorithm learn from?** a) The most ambiguous/difficult ones b) The most obvious/easy ones c) A random selection of observations d) All of the observations ] --- class: onecol ## Support Vector Classifier (SVC) The SVC algorithm is like MMC but it **allows examples to be misclassified** (i.e., wrong) This will **increase bias** (training errors) but hopefully **decrease variance** (testing errors) <img src="data:image/png;base64,#../figs/svc1.png" width="80%" /> --- class: onecol ## Support Vector Classifier SVCs also enable a model to be trained when the classes are **not perfectly separable** A straight line is never going to separate these classes without errors (sorry MMC...) <img src="data:image/png;base64,#../figs/svc2.png" width="80%" /> --- class: onecol ## Support Vector Classifier But if we allow a few errors and points within the margin... 
...we may be able to find a hyperplane that generalizes pretty well <img src="data:image/png;base64,#../figs/svc3.png" width="80%" /> --- class: onecol ## Support Vector Classifier When points are on the wrong side of the margin, they are called "violations" A **softer margin** allows more violations, whereas a **harder margin** allows fewer SVCs have a hyperparameter `\(C\)` that controls how soft vs. hard the margin is -- - A **lower `\(C\)` value** makes the margin harder (allows fewer violations)<sup>1</sup> As a result, the model has **lower bias** and more flexibility but may overfit - A **higher `\(C\)` value** makes the margin softer (allows more violations) As a result, the model has less flexibility but may also have **lower variance** .footnote[ [1] If you set `\(C=0\)` (i.e., a fully hard margin) SVC will allow no violations and behave the same as MMC. Note that this is the "budget" formulation of `\(C\)`; some software (including {kernlab}, used later) instead treats `\(C\)` as a *cost*, where *higher* values make the margin harder.] --- ## Comprehension Check \#2 .pull-left[ ### Question 1 **What is the main difference between the MMC and SVC algorithms?** a) The MMC algorithm uses hyperplanes b) The SVC algorithm allows training errors c) The MMC algorithm uses a softer margin d) The SVC algorithm prevents "violations" ] .pull-right[ ### Question 2 **Which value of `\(C\)` would be least likely to overfit?** a) `\(C=-1\)` b) `\(C=0\)` c) `\(C=1\)` d) `\(C=10\)` ] --- class: onecol ## Support Vector Machine So far, MMC and SVCs have both used linear (i.e., flat) hyperplanes But there are many times when the classes are not **linearly separable** <img src="data:image/png;base64,#../figs/svm1.png" width="80%" /> .footnote[[1] These classes seem separable, but not with a single decision point...] 
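--- class: onecol ## Support Vector Machine To see this problem in numbers, here is a tiny sketch with made-up data (not taken from the figures): no single cutoff on the raw feature separates the classes

```r
# Toy data: class "a" sits between two clumps of class "b",
# so no single cutoff on x separates the two classes...
x   <- c(-3, -2, -0.5, 0, 0.5, 2, 3)
cls <- c("b", "b", "a", "a", "a", "b", "b")

# ...but a cutoff on the squared feature x^2 does:
split(x^2, cls)
# every "a" value of x^2 falls below 1 and every "b" value above 1
```

A transformed feature like `\(x^2\)` hints at the fix: enlarge the feature space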
--- class: onecol ## Support Vector Machine But if we enlarge the feature space, the classes might then become linearly separable There are many ways to enlarge the feature space, but one is to add polynomial expansions <img src="data:image/png;base64,#../figs/svm2.png" width="80%" /> --- class: onecol ## Support Vector Machine The classes are now linearly separable in this new enlarged feature space! The hyperplane is flat in this new space, but would not look so in the original space <img src="data:image/png;base64,#../figs/svm3.png" width="80%" /> --- class: onecol ## Support Vector Machine Here is a more complex example of a nonlinear (and non-polynomial) expansion <img src="data:image/png;base64,#../figs/svm4.png" width="75%" /> .footnote[[1] Credit to [Erik Kim](https://www.eric-kim.net/eric-kim-net/posts/1/kernel_trick.html) for this example and visualization.] --- class: onecol ## Support Vector Machine And here is the hyperplane: it is linear in 3D but not when "projected" back into 2D <img src="data:image/png;base64,#../figs/svm5.png" width="75%" /> .footnote[[1] Credit to [Erik Kim](https://www.eric-kim.net/eric-kim-net/posts/1/kernel_trick.html) for this example and visualization.] --- class: onecol ## Support Vector Machine The **support vector machine** (SVM) allows us to efficiently enlarge the feature space - Part of what makes SVMs efficient is that they **only consider the support vectors** - They also use **kernel functions** to quantify the similarity of pairs of support vectors<sup>1</sup> .footnote[[1] These similarity estimates are used to efficiently find the optimal hyperplane but that process is complex.] 
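To make "similarity via a kernel function" concrete, here is a minimal sketch using {kernlab} (the engine behind caret's SVM methods); the vectors are made up for illustration

```r
library(kernlab)

x1 <- c(1, 2, 3)
x2 <- c(1, 2, 4)   # close to x1
x3 <- c(9, 0, -5)  # far from x1

# Linear kernel: similarity as an inner (dot) product
lin <- vanilladot()
lin(x1, x2)  # larger value = more similar
lin(x1, x3)

# RBF kernel: similarity decays with squared distance, scaled by sigma
rbf <- rbfdot(sigma = 0.5)
rbf(x1, x2)  # close points: value of exp(-0.5 * 1)
rbf(x1, x3)  # distant points: value essentially 0
```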
-- <p style="padding-top:15px;">The SVC can actually be considered a simple version of the SVM with a <b>linear kernel</b></p> - A linear kernel essentially quantifies similarity using the inner (dot) product `$$k(x, x') = \langle x, x'\rangle$$` -- <p style="padding-top:15px;">Linear kernels are efficient but <b>nonlinear kernels</b> may provide better performance</p> --- exclude: true class: onecol ## Support Vector Machine It is common to also use **nonlinear** kernels, such as the **polynomial kernel**<sup>1</sup> `$$k(x, x')=(\text{scale} \cdot \langle x, x' \rangle + \text{offset})^\text{degree}$$` With larger values of `\(\text{degree}\)`, the decision boundary can become more complex - You are essentially adding polynomial expansions of `\(\text{degree}\)` to each predictor - You have expanded the feature space and may now have linear separation - This is the same idea we just used in fitting a hyperplane in the `\(x\text{-by-}x^2\)` space! If you center or normalize all your predictors, you can drop the `\(\text{offset}\)` term .footnote[[1] When `\\(\text{degree}=1\\)`, the polynomial kernel reduces to the linear kernel and SVM becomes SVC again.] --- class: onecol ## Support Vector Machine Perhaps the most common nonlinear kernel is the **radial basis function** (RBF) kernel `$$k(x, x') = \exp\left(-\sigma \|x-x'\|^2\right)$$` -- The intuition here is that similarity is weighted by how *close* the observations are - Only support vectors near new observations influence classification strongly - As the `\(\sigma\)` hyperparameter<sup>1</sup> increases, the more *local* and complex the fit becomes -- The RBF kernel implicitly computes similarity between points in an *infinite*-dimensional space 🤯 - It is considered an ideal "general purpose" nonlinear kernel<sup>2</sup> -- .footnote[ [1] Note that `\(\sigma\)` is also sometimes called `\(\gamma\)` or the "scale" hyperparameter.<br /> [2] Other nonlinear kernels are popular for special purposes and in specialized subfields. 
] --- ## Comprehension Check \#3 .pull-left[ ### Question 1 **What is the main difference between the SVM and SVC algorithms?** a) The SVM algorithm uses kernel functions b) SVM will automatically square your features c) SVC is just the SVM with an RBF kernel d) They are the same exact thing, silly! ] .pull-right[ ### Question 2 **What are the hyperparameters for an SVM with a Radial kernel?** a) `\(\sigma\)` and `\(\gamma\)` b) `\(C\)` and `\(\sigma\)` c) `\(\alpha\)` and `\(\lambda\)` d) `\(C\)` and `\(\Delta\)` ] --- class: onecol ## Support Vector Regression (SVR) So far we have been concerned only with classification, but what about regression? In another very clever turn, you can **adapt SVM for regression** by reversing its goal - Instead of trying to separate classes and keep data points outside of the margin... - ...SVR instead tries to **keep all the data points inside the margin** (with no classes) .center[ <img src="data:image/png;base64,#../figs/svr.png" width="50%" /> ] --- class: onecol ## Support Vector Regression All the benefits of the SVM algorithm can be applied to the regression problem - Using a subset of the data points to define the margin, i.e., **support vectors** - Allowing violations (now points *outside* the margin) with a **soft margin** - Efficiently expanding the feature space through the use of **various kernels** We can use the same {caret} `method` and hyperparameters for SVM and SVR - We will use `\(C\)` to control the softness vs. hardness of the margin - If we use an RBF kernel, we will use `\(\sigma\)` to control the kernel scaling - Optionally, we can tune the `\(\varepsilon\)` hyperparameter to control the margin size --- class: onecol ## Support Vector Regression <img src="data:image/png;base64,#../figs/epsilonsvr.png" width="90%" /> --- class: onecol ## Applied Example Let's train three classification models to predict Titanic survival 1. A logistic regression (using "glm") 1. A linear SVM (using "svmLinear") 1. 
A radial SVM (using "svmRadialSigma") We can practice setting up tuning grids and also learn to compare models<sup>1</sup> We will also use CV on its own (rather than nested inside a holdout split)<sup>2</sup> Finally, we will train an SVR model to predict Titanic fare .footnote[ [1] Note that this is a relatively small dataset with few predictors, so we may not really need the power of SVM.<br /> [2] Note that nesting a `\(k\)`-fold CV inside another `\(k\)`-fold CV would give better estimates (but take even longer).] --- class: onecol ## Applied Example

```r
titanic <- read_csv("https://bit.ly/amlr-titanic")
```

--

```r
# Create the recipe and configure resampling
surv_recipe <- titanic %>% 
  recipe(survived ~ .) %>% 
  step_normalize(all_numeric_predictors()) %>% 
  step_dummy(all_nominal_predictors())
surv_tc <- trainControl(method = "cv", number = 10)
```

--- class: onecol ## Training the Logistic Regression and CV Performance

```r
set.seed(2021)
# glm has no hyperparameters
surv_glm <- train(
  surv_recipe,
* data = titanic,
  method = "glm",
  trControl = surv_tc
)
```

```r
surv_glm$results
```

<table> <thead> <tr> <th style="text-align:left;"> parameter </th> <th style="text-align:right;"> Accuracy </th> <th style="text-align:right;"> Kappa </th> <th style="text-align:right;"> AccuracySD </th> <th style="text-align:right;"> KappaSD </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> none </td> <td style="text-align:right;"> 0.791 </td> <td style="text-align:right;"> 0.562 </td> <td style="text-align:right;"> 0.065 </td> <td style="text-align:right;"> 0.135 </td> </tr> </tbody> </table> --- class: onecol ## Training the Linear SVM

```r
set.seed(2021)
# Let's create a tuning grid manually
surv_linsvm <- train(
  surv_recipe,
  data = titanic,
* method = "svmLinear",
* tuneGrid = expand.grid(C = c(1/4, 1/2, 1, 2, 4)),
  trControl = surv_tc
)
```

--- class: onecol ## CV Performance of the Linear SVM

```r
surv_linsvm$results
```

<table> <thead> <tr> <th 
style="text-align:right;"> C </th> <th style="text-align:right;"> Accuracy </th> <th style="text-align:right;"> Kappa </th> <th style="text-align:right;"> AccuracySD </th> <th style="text-align:right;"> KappaSD </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 0.25 </td> <td style="text-align:right;"> 0.796 </td> <td style="text-align:right;"> 0.57 </td> <td style="text-align:right;"> 0.058 </td> <td style="text-align:right;"> 0.119 </td> </tr> <tr> <td style="text-align:right;"> 0.50 </td> <td style="text-align:right;"> 0.796 </td> <td style="text-align:right;"> 0.57 </td> <td style="text-align:right;"> 0.058 </td> <td style="text-align:right;"> 0.119 </td> </tr> <tr> <td style="text-align:right;"> 1.00 </td> <td style="text-align:right;"> 0.796 </td> <td style="text-align:right;"> 0.57 </td> <td style="text-align:right;"> 0.058 </td> <td style="text-align:right;"> 0.119 </td> </tr> <tr> <td style="text-align:right;"> 2.00 </td> <td style="text-align:right;"> 0.796 </td> <td style="text-align:right;"> 0.57 </td> <td style="text-align:right;"> 0.058 </td> <td style="text-align:right;"> 0.119 </td> </tr> <tr> <td style="text-align:right;"> 4.00 </td> <td style="text-align:right;"> 0.796 </td> <td style="text-align:right;"> 0.57 </td> <td style="text-align:right;"> 0.058 </td> <td style="text-align:right;"> 0.119 </td> </tr> </tbody> </table> --- class: onecol ## Training the Radial SVM

```r
set.seed(2021)
# Let's let caret set up the tuning grid on this one
surv_rbfsvm <- train(
  surv_recipe,
  data = titanic,
* method = "svmRadialSigma",
* tuneLength = 5,
  trControl = surv_tc
)
```

--- class: onecol ## CV Performance of the Radial SVM

```r
surv_rbfsvm$results
```

<div style="border: 1px solid #ddd; padding: 0px; overflow-y: scroll; height:400px; "><table> <thead> <tr> <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> sigma </th> <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> C </th> <th 
style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> Accuracy </th> <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> Kappa </th> <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> AccuracySD </th> <th style="text-align:right;position: sticky; top:0; background-color: #FFFFFF;"> KappaSD </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 0.045 </td> <td style="text-align:right;"> 0.25 </td> <td style="text-align:right;"> 0.806 </td> <td style="text-align:right;"> 0.588 </td> <td style="text-align:right;"> 0.051 </td> <td style="text-align:right;"> 0.106 </td> </tr> <tr> <td style="text-align:right;"> 0.045 </td> <td style="text-align:right;"> 0.50 </td> <td style="text-align:right;"> 0.806 </td> <td style="text-align:right;"> 0.588 </td> <td style="text-align:right;"> 0.049 </td> <td style="text-align:right;"> 0.103 </td> </tr> <tr> <td style="text-align:right;"> 0.045 </td> <td style="text-align:right;"> 1.00 </td> <td style="text-align:right;"> 0.807 </td> <td style="text-align:right;"> 0.590 </td> <td style="text-align:right;"> 0.050 </td> <td style="text-align:right;"> 0.103 </td> </tr> <tr> <td style="text-align:right;"> 0.045 </td> <td style="text-align:right;"> 2.00 </td> <td style="text-align:right;"> 0.802 </td> <td style="text-align:right;"> 0.574 </td> <td style="text-align:right;"> 0.048 </td> <td style="text-align:right;"> 0.098 </td> </tr> <tr> <td style="text-align:right;"> 0.045 </td> <td style="text-align:right;"> 4.00 </td> <td style="text-align:right;"> 0.796 </td> <td style="text-align:right;"> 0.555 </td> <td style="text-align:right;"> 0.049 </td> <td style="text-align:right;"> 0.102 </td> </tr> <tr> <td style="text-align:right;"> 0.154 </td> <td style="text-align:right;"> 0.25 </td> <td style="text-align:right;"> 0.799 </td> <td style="text-align:right;"> 0.560 </td> <td style="text-align:right;"> 0.047 </td> <td style="text-align:right;"> 
0.098 </td> </tr> <tr> <td style="text-align:right;"> 0.154 </td> <td style="text-align:right;"> 0.50 </td> <td style="text-align:right;"> 0.800 </td> <td style="text-align:right;"> 0.562 </td> <td style="text-align:right;"> 0.049 </td> <td style="text-align:right;"> 0.101 </td> </tr> <tr> <td style="text-align:right;"> 0.154 </td> <td style="text-align:right;"> 1.00 </td> <td style="text-align:right;"> 0.805 </td> <td style="text-align:right;"> 0.575 </td> <td style="text-align:right;"> 0.046 </td> <td style="text-align:right;"> 0.096 </td> </tr> <tr> <td style="text-align:right;"> 0.154 </td> <td style="text-align:right;"> 2.00 </td> <td style="text-align:right;"> 0.817 </td> <td style="text-align:right;"> 0.605 </td> <td style="text-align:right;"> 0.049 </td> <td style="text-align:right;"> 0.102 </td> </tr> <tr> <td style="text-align:right;"> 0.154 </td> <td style="text-align:right;"> 4.00 </td> <td style="text-align:right;"> 0.809 </td> <td style="text-align:right;"> 0.588 </td> <td style="text-align:right;"> 0.047 </td> <td style="text-align:right;"> 0.098 </td> </tr> <tr> <td style="text-align:right;"> 0.262 </td> <td style="text-align:right;"> 0.25 </td> <td style="text-align:right;"> 0.796 </td> <td style="text-align:right;"> 0.553 </td> <td style="text-align:right;"> 0.053 </td> <td style="text-align:right;"> 0.111 </td> </tr> <tr> <td style="text-align:right;"> 0.262 </td> <td style="text-align:right;"> 0.50 </td> <td style="text-align:right;"> 0.807 </td> <td style="text-align:right;"> 0.580 </td> <td style="text-align:right;"> 0.045 </td> <td style="text-align:right;"> 0.092 </td> </tr> <tr> <td style="text-align:right;"> 0.262 </td> <td style="text-align:right;"> 1.00 </td> <td style="text-align:right;"> 0.809 </td> <td style="text-align:right;"> 0.587 </td> <td style="text-align:right;"> 0.047 </td> <td style="text-align:right;"> 0.098 </td> </tr> <tr> <td style="text-align:right;"> 0.262 </td> <td style="text-align:right;"> 2.00 </td> <td 
style="text-align:right;"> 0.809 </td> <td style="text-align:right;"> 0.588 </td> <td style="text-align:right;"> 0.048 </td> <td style="text-align:right;"> 0.101 </td> </tr> <tr> <td style="text-align:right;"> 0.262 </td> <td style="text-align:right;"> 4.00 </td> <td style="text-align:right;"> 0.802 </td> <td style="text-align:right;"> 0.572 </td> <td style="text-align:right;"> 0.044 </td> <td style="text-align:right;"> 0.092 </td> </tr> <tr> <td style="text-align:right;"> 0.371 </td> <td style="text-align:right;"> 0.25 </td> <td style="text-align:right;"> 0.801 </td> <td style="text-align:right;"> 0.564 </td> <td style="text-align:right;"> 0.050 </td> <td style="text-align:right;"> 0.105 </td> </tr> <tr> <td style="text-align:right;"> 0.371 </td> <td style="text-align:right;"> 0.50 </td> <td style="text-align:right;"> 0.810 </td> <td style="text-align:right;"> 0.589 </td> <td style="text-align:right;"> 0.047 </td> <td style="text-align:right;"> 0.099 </td> </tr> <tr> <td style="text-align:right;"> 0.371 </td> <td style="text-align:right;"> 1.00 </td> <td style="text-align:right;"> 0.811 </td> <td style="text-align:right;"> 0.593 </td> <td style="text-align:right;"> 0.047 </td> <td style="text-align:right;"> 0.099 </td> </tr> <tr> <td style="text-align:right;"> 0.371 </td> <td style="text-align:right;"> 2.00 </td> <td style="text-align:right;"> 0.801 </td> <td style="text-align:right;"> 0.571 </td> <td style="text-align:right;"> 0.047 </td> <td style="text-align:right;"> 0.099 </td> </tr> <tr> <td style="text-align:right;"> 0.371 </td> <td style="text-align:right;"> 4.00 </td> <td style="text-align:right;"> 0.799 </td> <td style="text-align:right;"> 0.568 </td> <td style="text-align:right;"> 0.047 </td> <td style="text-align:right;"> 0.098 </td> </tr> <tr> <td style="text-align:right;"> 0.479 </td> <td style="text-align:right;"> 0.25 </td> <td style="text-align:right;"> 0.802 </td> <td style="text-align:right;"> 0.567 </td> <td style="text-align:right;"> 0.050 
</td> <td style="text-align:right;"> 0.104 </td> </tr> <tr> <td style="text-align:right;"> 0.479 </td> <td style="text-align:right;"> 0.50 </td> <td style="text-align:right;"> 0.808 </td> <td style="text-align:right;"> 0.585 </td> <td style="text-align:right;"> 0.048 </td> <td style="text-align:right;"> 0.100 </td> </tr> <tr> <td style="text-align:right;"> 0.479 </td> <td style="text-align:right;"> 1.00 </td> <td style="text-align:right;"> 0.809 </td> <td style="text-align:right;"> 0.588 </td> <td style="text-align:right;"> 0.047 </td> <td style="text-align:right;"> 0.100 </td> </tr> <tr> <td style="text-align:right;"> 0.479 </td> <td style="text-align:right;"> 2.00 </td> <td style="text-align:right;"> 0.801 </td> <td style="text-align:right;"> 0.572 </td> <td style="text-align:right;"> 0.046 </td> <td style="text-align:right;"> 0.095 </td> </tr> <tr> <td style="text-align:right;"> 0.479 </td> <td style="text-align:right;"> 4.00 </td> <td style="text-align:right;"> 0.787 </td> <td style="text-align:right;"> 0.544 </td> <td style="text-align:right;"> 0.047 </td> <td style="text-align:right;"> 0.097 </td> </tr> </tbody> </table></div> --- class: onecol ## Plotting Accuracy Against `\(C\)` and `\(\sigma\)` Values

```r
ggplot(surv_rbfsvm)
```

<img src="data:image/png;base64,#Day_4A_Slides_files/figure-html/rbfhp-1.png" width="90%" /> --- ## Comparing the Algorithms' CV Performance

```r
surv_res <- resamples(list(LogisticGLM = surv_glm,
                           LinearSVM = surv_linsvm,
                           RadialSVM = surv_rbfsvm))
summary(surv_res, metric = "Accuracy")
#> 
#> Call:
#> caret:::summary.resamples(object = x, metric = metric)
#> 
#> Models: LogisticGLM, LinearSVM, RadialSVM 
#> Number of resamples: 10 
#> 
#> Accuracy 
#>              Min. 1st Qu. Median  Mean 3rd Qu.  Max. NA's
#> LogisticGLM 0.670   0.763  0.788 0.791   0.844 0.885    0
#> LinearSVM   0.691   0.776  0.804 0.796   0.833 0.875    0
#> RadialSVM   0.722   0.794  0.814 0.817   0.850 0.885    0
```

--- ## Comparing the Algorithms' CV Performance

```r
bwplot(surv_res)
```

<img src="data:image/png;base64,#Day_4A_Slides_files/figure-html/unnamed-chunk-30-1.png" width="100%" /> --- class: onecol ## Quick SVR Example

```r
svr_recipe <- titanic %>% 
  recipe(fare ~ .) %>% 
  step_rm(survived) %>% 
  step_dummy(all_nominal_predictors()) %>% 
  step_normalize(all_numeric_predictors())
```

--- ## Quick SVR Example

```r
fare_svr <- train(
  svr_recipe,
  data = titanic,
  method = "svmRadial",
  tuneLength = 5,
  trControl = surv_tc
)
fare_svr$results
```

<table> <thead> <tr> <th style="text-align:right;"> sigma </th> <th style="text-align:right;"> C </th> <th style="text-align:right;"> RMSE </th> <th style="text-align:right;"> Rsquared </th> <th style="text-align:right;"> MAE </th> <th style="text-align:right;"> RMSESD </th> <th style="text-align:right;"> RsquaredSD </th> <th style="text-align:right;"> MAESD </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 0.224 </td> <td style="text-align:right;"> 0.25 </td> <td style="text-align:right;"> 42.759 </td> <td style="text-align:right;"> 0.526 </td> <td style="text-align:right;"> 17.037 </td> <td style="text-align:right;"> 13.993 </td> <td style="text-align:right;"> 0.116 </td> <td style="text-align:right;"> 3.168 </td> </tr> <tr> <td style="text-align:right;"> 0.224 </td> <td style="text-align:right;"> 0.50 </td> <td style="text-align:right;"> 41.102 </td> <td style="text-align:right;"> 0.552 </td> <td style="text-align:right;"> 16.594 </td> <td style="text-align:right;"> 13.815 </td> <td style="text-align:right;"> 0.120 </td> <td style="text-align:right;"> 3.014 </td> </tr> <tr> <td style="text-align:right;"> 0.224 </td> <td style="text-align:right;"> 1.00 </td> <td style="text-align:right;"> 39.483 </td> <td style="text-align:right;"> 0.570 </td> <td 
style="text-align:right;"> 16.186 </td> <td style="text-align:right;"> 13.716 </td> <td style="text-align:right;"> 0.131 </td> <td style="text-align:right;"> 2.905 </td> </tr> <tr> <td style="text-align:right;"> 0.224 </td> <td style="text-align:right;"> 2.00 </td> <td style="text-align:right;"> 38.731 </td> <td style="text-align:right;"> 0.567 </td> <td style="text-align:right;"> 15.993 </td> <td style="text-align:right;"> 13.463 </td> <td style="text-align:right;"> 0.134 </td> <td style="text-align:right;"> 2.773 </td> </tr> <tr> <td style="text-align:right;"> 0.224 </td> <td style="text-align:right;"> 4.00 </td> <td style="text-align:right;"> 38.696 </td> <td style="text-align:right;"> 0.554 </td> <td style="text-align:right;"> 16.054 </td> <td style="text-align:right;"> 13.145 </td> <td style="text-align:right;"> 0.129 </td> <td style="text-align:right;"> 2.585 </td> </tr> </tbody> </table> --- ## Quick SVR Example

```r
densityplot(fare_svr, metric = "Rsquared")
```

<img src="data:image/png;base64,#Day_4A_Slides_files/figure-html/unnamed-chunk-33-1.png" width="100%" /> --- class: onecol ## Live Coding Let's try to predict whether it is safe or unsafe for humans to drink from a water source based on its chemical properties (e.g., pH, hardness, sulfate content). We will recap how to **train SVM classifiers** with RBF kernels We will also compare **two approaches** to selecting the `\(\sigma\)` hyperparameter: 1. Tuning it via grid search (i.e., trying and comparing many values) 1. Letting the {kernlab} package automatically estimate it Finally, I can also show you an example of **imputing missing values** with {recipes} --- class: onecol ## Hands-on Activity Let's try to predict whether an employee will quit their job based on various measures. 1. Load the `attrition` dataset<sup>1</sup> from the {modeldata} package (Hint: use `data(...)`) 1. Explore the data and create a recipe that includes normalization and dummy coding 1. 
Configure your resampling to be a single 10-fold cross-validation 1. Train models using the following algorithms and settings to classify `Attrition` + GLMNET with a `tuneLength` of 5 for `\(\alpha\)` and `\(\lambda\)` + SVM with RBF kernel with a `tuneLength` of 5 for `\(C\)` (let {kernlab} estimate `\(\sigma\)`) 1. Collect the resampling results using `resamples()` and plot them using `bwplot()` .footnote[[1] Note that this is simulated (i.e., not real) data from the IBM Watson group.] --- class: inverse, center, middle # Time for a Break!
10:00
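--- class: onecol ## Appendix: The RBF Kernel by Hand As a quick sanity check, you can compute the RBF kernel formula from earlier directly in R (with made-up vectors)

```r
# k(x, x') = exp(-sigma * ||x - x'||^2), written out by hand
rbf_kernel <- function(x, x_prime, sigma) {
  exp(-sigma * sum((x - x_prime)^2))
}

x  <- c(1, 2)
xp <- c(2, 4)          # squared distance from x is 1 + 4 = 5
rbf_kernel(x, xp, 0.1) # exp(-0.5), about 0.61

# Larger sigma: similarity falls off faster, so the fit is more local
rbf_kernel(x, xp, 2)   # exp(-10), about 0.000045
```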